Goodness-of-Fit Measures for Induction Trees
Abstract
This paper is concerned with the goodness-of-fit of induced decision trees. Namely, we explore the possibility of measuring goodness-of-fit as is classically done in statistical modeling. We show how chi-square statistics, and especially the log-likelihood ratio statistic that is widely used in the modeling of cross tables, can be adapted for induction trees. The log-likelihood ratio statistic is not only suited for testing goodness-of-fit; it also allows testing the significance of the difference in fit between two nested trees. In addition, we derive pseudo-R² measures from it. We also propose adapted forms of the Akaike (AIC) and Bayesian (BIC) information criteria that prove useful in selecting the model with the best compromise between fit and complexity.
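The abstract does not give the formulas, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the paper's exact method. It assumes the data are summarized as a target table of class counts (rows) by predictor profile (columns), with every profile observed at least once, and that each profile inherits the class distribution of the tree leaf it falls into. G² is then computed against the saturated tree (one leaf per profile), with (r − 1)(c − q) degrees of freedom for r classes, c profiles and q leaves. The pseudo-R² shown is a McFadden-style ratio 1 − G²(tree)/G²(root), and AIC/BIC assume q(r − 1) free parameters; the paper's own variants may differ. All function names and the example data are hypothetical.

```python
import math

def fitted_counts(table, leaf_of):
    """Counts predicted by a tree for an r x c target table.

    table[i][j]: observed count of class i among cases with profile j.
    leaf_of[j]:  label of the leaf that profile j falls into (hypothetical).
    Assumes every profile has a positive column total.
    """
    r, c = len(table), len(table[0])
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(c)]
    leaf_counts = {}  # leaf -> class counts pooled over its profiles
    for j, leaf in enumerate(leaf_of):
        acc = leaf_counts.setdefault(leaf, [0] * r)
        for i in range(r):
            acc[i] += table[i][j]
    fitted = [[0.0] * c for _ in range(r)]
    for j, leaf in enumerate(leaf_of):
        acc = leaf_counts[leaf]
        leaf_tot = sum(acc)
        for i in range(r):
            # Profile j gets its leaf's class distribution, scaled to its own total.
            fitted[i][j] = col_tot[j] * acc[i] / leaf_tot
    return fitted

def deviance(table, leaf_of):
    """Log-likelihood ratio statistic G^2 of the tree vs. the saturated tree."""
    g2 = 0.0
    for obs_row, fit_row in zip(table, fitted_counts(table, leaf_of)):
        for n, m in zip(obs_row, fit_row):
            if n > 0:  # terms with n = 0 contribute 0
                g2 += 2.0 * n * math.log(n / m)
    return g2

def fit_summary(table, leaf_of):
    """G^2, degrees of freedom, McFadden-style pseudo-R^2, AIC and BIC."""
    r, c = len(table), len(table[0])
    n = sum(map(sum, table))
    q = len(set(leaf_of))
    d_tree = deviance(table, leaf_of)
    d_root = deviance(table, [0] * c)  # root node: a single leaf
    k = q * (r - 1)  # assumed: r - 1 free class proportions per leaf
    return {
        "G2": d_tree,
        "df": (r - 1) * (c - q),
        "pseudo_R2": 1.0 - d_tree / d_root if d_root > 0 else float("nan"),
        "AIC": d_tree + 2 * k,
        "BIC": d_tree + k * math.log(n),
    }

def nested_test(table, leaves_small, leaves_big):
    """G^2 difference and its df for a tree nested in a larger one."""
    r = len(table)
    diff = deviance(table, leaves_small) - deviance(table, leaves_big)
    ddf = (r - 1) * (len(set(leaves_big)) - len(set(leaves_small)))
    return diff, ddf

# Hypothetical data: 2 classes (rows) x 4 predictor profiles (columns).
table = [[20, 5, 10, 2],
         [5, 20, 10, 18]]
tree = ["left", "right", "left", "right"]  # a 2-leaf tree
print(fit_summary(table, tree))
print(nested_test(table, [0, 0, 0, 0], tree))  # root vs. 2-leaf tree
```

A p-value for either statistic can be read from the chi-square distribution with the returned degrees of freedom, for example with `scipy.stats.chi2.sf(G2, df)` if SciPy is available. Working on the profile table rather than on individual cases keeps the saturated model finite, which is what makes G² usable as a goodness-of-fit statistic here.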
Similar resources
Flood Hydrograph Simulation with Uncertainty in Rainfall - Runoff Parameters
Flood hydrograph simulation is affected by uncertainty in Rainfall–Runoff (RR) parameters. Uncertainty of RR parameters in the Gharasoo catchment, part of the great Karkheh river basin, is evaluated by a Monte Carlo (MC) approach. A conceptual-distributed model, called ModClark, was used for basin simulation, in which the basin’s hydrograph was determined using the superposition of runoff generated...
Statistical Preprocessing for Decision Tree Induction
Some apparently simple numeric data sets cause significant problems for existing decision tree induction algorithms, in that no method is able to find a small, accurate tree, even though one exists. One source of this difficulty is the goodness measures used to decide whether a particular node represents a good way to split the data. This paper points out that the commonly-used goodness measures are...
TreeSAAP: Selection on Amino Acid Properties using phylogenetic trees
The software program TreeSAAP measures the selective influences on 31 structural and biochemical amino acid properties during cladogenesis, and performs goodness-of-fit and categorical statistical tests.
The Comparison Between Goodness of Fit Tests for Copula
Copula functions, as models, can show the relationship between variables. The appropriate copula function for a specific application is the one that best captures the dependency in the data. In theory, goodness-of-fit tests are the best way to select a copula function. Several goodness-of-fit approaches for copulas exist. In this paper we will examine the goodness of fit test...